Future work sentences (FWS) are the particular sentences in academic papers that contain the author's description of their proposed follow-up research direction. This paper presents methods to automatically extract FWS from academic papers and classify them according to the different future directions embodied in the paper's content. FWS recognition methods will enable subsequent researchers to locate future work sentences more accurately and quickly and reduce the time and cost of acquiring the corpus. The current work on automatic identification of future work sentences is relatively small, and the existing research cannot accurately identify FWS from academic papers, and thus cannot conduct data mining on a large scale. Furthermore, there are many aspects to the content of future work, and the subdivision of the content is conducive to the analysis of specific development directions. In this paper, Nature Language Processing (NLP) is used as a case study, and FWS are extracted from academic papers and classified into different types. We manually build an annotated corpus with six different types of FWS. Then, automatic recognition and classification of FWS are implemented using machine learning models, and the performance of these models is compared based on the evaluation metrics. The results show that the Bernoulli Bayesian model has the best performance in the automatic recognition task, with the Macro F1 reaching 90.73%, and the SCIBERT model has the best performance in the automatic classification task, with the weighted average F1 reaching 72.63%. Finally, we extract keywords from FWS and gain a deep understanding of the key content described in FWS, and we also demonstrate that content determination in FWS will be reflected in the subsequent research work by measuring the similarity between future work sentences and the abstracts.
translated by 谷歌翻译
Users' involvement in creating and propagating news is a vital aspect of fake news detection in online social networks. Intuitively, credible users are more likely to share trustworthy news, while untrusted users have a higher probability of spreading untrustworthy news. In this paper, we construct a dual-layer graph (i.e., the news layer and the user layer) to extract multiple relations of news and users in social networks to derive rich information for detecting fake news. Based on the dual-layer graph, we propose a fake news detection model named Us-DeFake. It learns the propagation features of news in the news layer and the interaction features of users in the user layer. Through the inter-layer in the graph, Us-DeFake fuses the user signals that contain credibility information into the news features, to provide distinctive user-aware embeddings of news for fake news detection. The training process conducts on multiple dual-layer subgraphs obtained by a graph sampler to scale Us-DeFake in large scale social networks. Extensive experiments on real-world datasets illustrate the superiority of Us-DeFake which outperforms all baselines, and the users' credibility signals learned by interaction relation can notably improve the performance of our model.
translated by 谷歌翻译
The application of Natural Language Processing (NLP) to specialized domains, such as the law, has recently received a surge of interest. As many legal services rely on processing and analyzing large collections of documents, automating such tasks with NLP tools emerges as a key challenge. Many popular language models, such as BERT or RoBERTa, are general-purpose models, which have limitations on processing specialized legal terminology and syntax. In addition, legal documents may contain specialized vocabulary from other domains, such as medical terminology in personal injury text. Here, we propose LegalRelectra, a legal-domain language model that is trained on mixed-domain legal and medical corpora. We show that our model improves over general-domain and single-domain medical and legal language models when processing mixed-domain (personal injury) text. Our training architecture implements the Electra framework, but utilizes Reformer instead of BERT for its generator and discriminator. We show that this improves the model's performance on processing long passages and results in better long-range text comprehension.
translated by 谷歌翻译
Audio-visual speech recognition (AVSR) has gained remarkable success for ameliorating the noise-robustness of speech recognition. Mainstream methods focus on fusing audio and visual inputs to obtain modality-invariant representations. However, such representations are prone to over-reliance on audio modality as it is much easier to recognize than video modality in clean conditions. As a result, the AVSR model underestimates the importance of visual stream in face of noise corruption. To this end, we leverage visual modality-specific representations to provide stable complementary information for the AVSR task. Specifically, we propose a reinforcement learning (RL) based framework called MSRL, where the agent dynamically harmonizes modality-invariant and modality-specific representations in the auto-regressive decoding process. We customize a reward function directly related to task-specific metrics (i.e., word error rate), which encourages the MSRL to effectively explore the optimal integration strategy. Experimental results on the LRS3 dataset show that the proposed method achieves state-of-the-art in both clean and various noisy conditions. Furthermore, we demonstrate the better generality of MSRL system than other baselines when test set contains unseen noises.
translated by 谷歌翻译
链接的语音实体旨在识别和消除语言中的命名实体。常规方法严重遭受了不受限制的语音样式和ASR系统产生的嘈杂笔录。在本文中,我们提出了一种名为“知识增强命名实体识别”(KENER)的新颖方法,该方法致力于通过在实体识别阶段无痛地纳入适当的知识来改善鲁棒性,从而改善实体联系的整体性能。肯纳(Kener)首先检索未提及的句子的候选实体,然后利用实体描述作为额外的信息来帮助识别提及。当输入短或嘈杂时,由密集检索模块检索的候选实体特别有用。此外,我们研究了各种数据采样策略和设计有效的损失功能,以提高识别和歧义阶段中检索实体的质量。最后,将与过滤模块的链接作为最终保障措施应用,从而可以过滤出错误认可的提及。我们的系统在NLPCC-2022共享任务2的轨道1中获得第一名,并在轨道1中获得第一名。
translated by 谷歌翻译
由于生成对抗网络(GAN)的突破,3D可控制的肖像合成已大大提高。但是,用精确的3D控制操纵现有的面部图像仍然具有挑战性。虽然连接gan倒置和3D感知,但噪声到图像是一种直接的解决方案,但它效率低下,可能导致编辑质量明显下降。为了填补这一空白,我们提出了3D-FM GAN,这是一个专门为3D可控制的面部操作设计的新型有条件GAN框架,并且在端到端学习阶段后不需要任何调整。通过小心地编码输入面图像和3D编辑的基于物理的渲染,我们的图像生成器提供了高质量,具有身份的3D控制面部操纵。为了有效地学习这种新颖的框架,我们制定了两种基本的训练策略和一种新颖的乘法共同调制体系结构,可在天真的方案上显着改善。通过广泛的评估,我们表明我们的方法在各种任务上的表现优于先前的艺术,具有更好的编辑性,更强的身份保存和更高的照片真实性。此外,我们在大型姿势编辑和室外图像上展示了设计更好的概括性。
translated by 谷歌翻译
早期发现焦虑症对于减少精神障碍患者的苦难并改善治疗结果至关重要。基于MHealth平台的焦虑筛查在提高筛选效率和降低筛查成本方面具有特殊实用价值。实际上,受试者的身体和心理评估中移动设备的差异以及数据质量不均匀的问题和现实世界中数据的少量数据量使现有方法无效。因此,我们提出了一个基于时空特征融合的框架,用于非触发焦虑。为了降低数据质量不平衡的影响,我们构建了一个基于“ 3DCNN+LSTM”的特征提取网络,并融合了面部行为和非接触式生理学的时空特征。此外,我们设计了一种相似性评估策略,以解决较小的数据样本量导致模型准确性下降的问题。我们的框架已通过现实世界中的机组数据集进行了验证,并且两个公共数据集UBFC-Phys和Swell-KW。实验结果表明,我们框架的总体性能要比最新的比较方法更好。
translated by 谷歌翻译
强化学习(RL)技术在许多具有挑战性的任务中引起了极大的关注,但是当应用于现实世界问题时,它们的性能急剧恶化。已经提出了各种方法,例如域随机化,以通过不同的环境设置下的培训代理来应对这种情况,因此在部署过程中可以将它们推广到不同的环境。但是,它们通常不包含与代理人正确相互作用的潜在环境因素信息,因此在面对周围环境变化时可能会过于保守。在本文中,我们首先将适应RL中的环境动态的任务形式化为使用上下文Markov决策过程(CMDP)的概括问题。然后,我们在上下文RL(AACC)中提出了不对称的参与者 - 作为处理此类概括任务的端到端参与者的方法。我们在一系列模拟环境中证明了AACC对现有基线的性能的基本改进。
translated by 谷歌翻译
通过开发基于生成的自我监督学习(SSL)方法,例如Beit和Mae,如何通过掩盖输入图像的随机补丁并重建缺失信息来学习良好的表示形式。但是,Beit和Peco需要一个“预先陈述”阶段,以生成用于掩盖补丁代表的离散代码手册。 MAE不需要预训练的代码簿流程,但是将像素设置为重建目标可能会引入前训练和下游任务之间的优化差距,即良好的重建质量可能并不总是会导致模型的高描述能力。考虑到上述问题,在本文中,我们提出了一个简单的自鉴定的蒙面自动编码器网络,即SDAE。 SDAE由一个使用编码器解码器结构的学生分支组成,以重建缺失的信息,并制作一个师范分支,生产蒙版代币的潜在表示。我们还分析了如何从信息瓶颈的角度来为教师分支机构建立潜在代表性的好看法。之后,我们提出了一种多重掩蔽策略,以提供多个掩盖视图,并具有平衡的信息以提高性能,这也可以降低计算复杂性。我们的方法很好地概括了:只有300个时期预训练,香草vit-base模型在Imagenet-1K分类上达到了84.1%的微调精度,48.6 MIOU在ADE20K细分方面和48.9 coco检测中的MAP,它超过了其他方法,从而超过其他方法。通过相当大的边距。代码可从https://github.com/abrahamyabo/sdae获得。
translated by 谷歌翻译
本文介绍了我们对第四次情感行为分析(ABAW)竞争的多任务学习(MTL)挑战的提交。基于视觉功能表示,我们利用三种类型的时间编码器来捕获视频中的时间上下文信息,包括基于变压器的编码器,基于LSTM的编码器和基于GRU的编码器。使用时间上下文感知表示,我们采用多任务框架来预测图像的价,唤醒,表达和AU值。此外,将平滑处理用于完善初始价和唤醒预测,并使用模型集成策略来结合不同模型设置的多个结果。我们的系统在MTL挑战验证数据集上实现了$ 1.742 $的性能。
translated by 谷歌翻译